petiteFinder: an automated computer vision tool to compute Petite colony frequencies in baker’s yeast

Background: Mitochondrial respiration is central to cellular and organismal health in eukaryotes. In baker's yeast, however, respiration is dispensable under fermentation conditions. Because yeast tolerate this mitochondrial dysfunction, they are widely used by biologists as a model organism to ask a variety of questions about the integrity of mitochondrial respiration. Fortunately, baker's yeast also display a visually identifiable Petite colony phenotype that indicates when cells are incapable of respiration. Petite colonies are smaller than their Grande (wild-type) counterparts, and their frequency can be used to infer the integrity of mitochondrial respiration in populations of cells. Unfortunately, the computation of Petite colony frequencies currently relies on laborious manual colony counting methods, which limit both experimental throughput and reproducibility.

Results: To address these problems, we introduce a deep learning enabled tool, petiteFinder, that increases the throughput of the Petite frequency assay. This automated computer vision tool detects Grande and Petite colonies and computes Petite colony frequencies from scanned images of Petri dishes. It achieves accuracy comparable to human annotation, but at up to 100 times the speed, and outperforms semi-supervised Grande/Petite colony classification approaches. Combined with the detailed experimental protocols we provide, we believe this study can serve as a foundation to standardize this assay. Finally, we comment on how Petite colony detection as a computer vision problem highlights ongoing difficulties with small object detection in existing object detection architectures.

Conclusion: Colony detection with petiteFinder results in high-accuracy Petite and Grande detection in images in a completely automated fashion. It addresses issues in the scalability and reproducibility of the Petite colony assay, which currently relies on manual colony counting. By constructing this tool and providing details of experimental conditions, we hope this study will enable larger-scale experiments that rely on Petite colony frequencies to infer mitochondrial function in yeast.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-023-05168-5.

The annotated images were split into 80% validation sets and 20% test sets. See Figure A.1 for a summary of this dataset.

Figure A.1: An overview of the labeled dataset for the semi-supervised approach. Petri dishes of synthetic complete media and Grande/Petite yeast colonies were placed into a 3D printed insert and scanned bottom-up on a computer scanner (leftmost image). Individual plate images were cropped out of the large image, and 59 of these images were annotated using the LabelMe annotation tool [1]. Grande and Petite colonies are indicated by green and red bounding boxes, respectively. Experimental variation in media preparation and plate pouring produces both diffuse, low quality scans and sharper plate images, as shown by the upper and lower plates in the middle panel.

Colonies were separated into Petites and Grandes by average-linkage clustering on size and intensity features. This clustering method was selected because there is no implicit assumption of equal cluster sizes, and it is sufficiently sensitive to outliers, which is important for plate images with low/high Petite frequencies.
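As a concrete illustration, the following is a minimal sketch of this clustering step, assuming average-linkage agglomerative clustering on per-colony area and mean intensity features (the features named in the Figure A.3 caption) via scikit-learn; the exact feature scaling in our pipeline may differ.

```python
# Minimal sketch: classify segmented colonies as Petite or Grande by
# average-linkage clustering on (area, mean intensity) features.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

def classify_colonies(areas, mean_intensities):
    """Cluster colonies into two groups; the smaller cluster is called Petite."""
    X = StandardScaler().fit_transform(np.column_stack([areas, mean_intensities]))
    labels = AgglomerativeClustering(n_clusters=2, linkage="average").fit_predict(X)
    # Petites are smaller on average, so the cluster with the lower mean
    # area is labeled Petite.
    petite = min((0, 1), key=lambda c: np.mean(np.asarray(areas)[labels == c]))
    return np.where(labels == petite, "Petite", "Grande")
```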
We evaluate the effectiveness of this classification scheme on all 59 plate images by considering the precision and recall of Petite classification (Figure A.2c). The precision is $\frac{TP}{TP+FP}$, where $TP$ is the true positive classification count and $FP$ is the false positive count. The recall is $\frac{TP}{TP+FN}$, where $FN$ is the false negative count. The precision and recall for Petites across all 59 images are 0.933 and 0.999, respectively. The Grande precision and recall are 0.999 and 0.912, respectively. These metrics suggest that the classification scheme struggles with false positive Petite classifications, which are due to Grande colonies being misclassified as Petites. In Figure A.2d we show the Petite frequencies predicted per plate by this scheme compared to the ground truth from annotated data. It highlights that while average Petite classification accuracy is good (average deviation of 0.038 in Petite frequency, standard deviation of 0.058), plates where Grande colonies are overwhelmingly classified as Petite can produce significant deviations from ground truth measurements. Overall, however, these results were promising and suggested that this type of classification scheme, combined with a colony detection pipeline, might perform reasonably well. We discuss the construction of this joint colony detection and classification pipeline in the next section.
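For reference, a minimal sketch of the per-class precision and recall computation, assuming matched lists of ground truth and predicted labels:

```python
# Minimal sketch: precision = TP / (TP + FP), recall = TP / (TP + FN),
# computed for one class ("Petite" or "Grande") over matched colonies.
def precision_recall(truth, pred, positive="Petite"):
    tp = sum(t == positive and p == positive for t, p in zip(truth, pred))
    fp = sum(t != positive and p == positive for t, p in zip(truth, pred))
    fn = sum(t == positive and p != positive for t, p in zip(truth, pred))
    return tp / (tp + fp), tp / (tp + fn)

# e.g. petite_precision, petite_recall = precision_recall(truth, pred)
```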

A.2.2 Combining colony detection and classification
In order to create an automated computer vision pipeline that accepts plate images and computes Petite frequencies, we needed to combine a colony detection method with the previously explored classification scheme. To detect colonies, we opted for a thresholding approach that separates colonies, which are brighter in images, from the darker uniform background of the agar surface (Figure A.3). To start, we apply a Hough transform [2] to identify the plate edge, with an estimated plate radii range provided by the user. An example of plate edge detection is shown in red in the upper right panel of Figure A.3. This allows us to mask out the exterior of the plate and restrict colony detection to real colonies on the agar surface. This is followed by Otsu's thresholding [3], an algorithm that selects a foreground/background intensity threshold by finding the value that minimizes the intensity variation within each class. Given maximum (args.max) and minimum (args.min) expected colony sizes provided by the user, we then apply a watershed transform [4] with a minimum separation between watershed basin minima of $\mathrm{args.max}/(4\pi)$ to focus on separating merged Grande colonies. This is followed by the removal of colonies with sizes above args.max, below args.min, or with eccentricity > 0.9. We found that filtering out only high-eccentricity objects removes fabric fibres from images but retains segmented merged colonies, which do not necessarily have an eccentricity near zero. Finally, segmented regions are classified with the clustering scheme of the previous section, under the assumption that they are colonies.

Figure A.2 (d): The Petite frequencies computed per plate in the labeled data using this clustering approach, alongside the ground truth Petite frequency determined through human annotation. The average and standard deviation of the prediction error are also denoted, where prediction error is the difference between ground truth and predicted Petite frequencies per image.

Figure A.3: A Hough transform accepts a user-suggested radius parameter (to account for various plate and image sizes) and outputs the predicted plate boundary in red. A mask is applied to the exterior of the plate, followed by Otsu's thresholding to segment the image into foreground (white) and background (black). Given a maximum (args.max) and minimum (args.min) expected colony size provided by the user, a watershed transform is applied to the foreground with a minimum separation between basins of $\mathrm{args.max}/(4\pi)$ to separate only merged large colonies, which is the most common case. Colonies with size above args.max, below args.min, or with eccentricity > 0.9 are removed. Finally, the labeled foreground objects are clustered using average-linkage clustering on the size and intensity features of each labeled region.
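To make the pipeline concrete, here is a minimal sketch of these detection steps using scikit-image. The function and parameter names (detect_colonies, plate_radii, args_min, args_max) are illustrative, and details such as the watershed seeding may differ from our implementation:

```python
# Minimal sketch of the thresholding-based colony detection pipeline:
# Hough plate detection -> exterior mask -> Otsu threshold -> watershed
# -> size/eccentricity filtering. Assumes an RGB scan as input.
import numpy as np
from scipy import ndimage as ndi
from skimage import io, color, filters, measure, segmentation
from skimage.feature import canny, peak_local_max
from skimage.transform import hough_circle, hough_circle_peaks

def detect_colonies(path, plate_radii, args_min, args_max):
    gray = color.rgb2gray(io.imread(path))

    # 1. Hough transform on edges to find the plate boundary.
    edges = canny(gray, sigma=2)
    hough_res = hough_circle(edges, plate_radii)
    _, cx, cy, radii = hough_circle_peaks(hough_res, plate_radii, total_num_peaks=1)

    # 2. Mask out everything outside the detected plate.
    yy, xx = np.indices(gray.shape)
    inside = (xx - cx[0]) ** 2 + (yy - cy[0]) ** 2 < radii[0] ** 2
    masked = np.where(inside, gray, 0.0)

    # 3. Otsu's threshold separates bright colonies from dark agar.
    fg = masked > filters.threshold_otsu(masked[inside])

    # 4. Watershed on the distance transform, with basin minima separated
    #    by at least args.max / (4*pi) as in the text, to split merged colonies.
    dist = ndi.distance_transform_edt(fg)
    min_sep = max(int(args_max / (4 * np.pi)), 1)
    peaks = peak_local_max(dist, min_distance=min_sep, labels=fg)
    markers = np.zeros_like(dist, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    labels = segmentation.watershed(-dist, markers, mask=fg)

    # 5. Filter regions by size and eccentricity (fabric fibres are elongated).
    return [r for r in measure.regionprops(labels, intensity_image=gray)
            if args_min <= r.area <= args_max and r.eccentricity <= 0.9]
```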

A.2.3 Evaluation of the combined colony detection and classification pipeline
To evaluate how effective this complete Petite classification pipeline is, we must first address three important points: The first is that args.min and args.max, the expected minimum and maximum colony sizes, must be selected for the set of data on which we test the pipeline. We do this by testing numerous ranges of these parameters on the "non-ideal" and "ideal" validation image sets and picking the parameter values that maximize performance as measured by the F1 score, $F_1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$. The maximum value of this score indicates the best result across all parameter regimes if precision and recall are equally weighted. With this approach the pipeline becomes semi-supervised, which is similar to users inputting these maximum and minimum colony sizes per image or batch of images, as is common in modern colony detection approaches [5,6].
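A minimal sketch of this validation sweep is shown below; score_pipeline is a hypothetical, caller-supplied helper standing in for a full detection + classification run that returns precision and recall:

```python
# Minimal sketch: grid-search (args.min, args.max) on a validation set,
# keeping the pair that maximizes the F1 score.
from itertools import product

def best_size_params(score_pipeline, images, min_grid, max_grid):
    """score_pipeline(images, args_min, args_max) -> (precision, recall)."""
    def f1(p, r):
        return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

    scores = {}
    for args_min, args_max in product(min_grid, max_grid):
        if args_min >= args_max:
            continue  # skip degenerate size ranges
        p, r = score_pipeline(images, args_min, args_max)
        scores[(args_min, args_max)] = f1(p, r)
    return max(scores, key=scores.get)
```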
The second point is that to compare predicted bounding boxes (the rectangles that bound unique segmented regions from the pipeline) with ground truth bounding boxes from human labeling, we have to consider the overlap of these rectangles. The most common approach is to compute the intersection over union (IOU) of the predicted and ground truth bounding boxes and set an IOU threshold above which a prediction is considered good. As discussed later, the thresholding approach we took in the pipeline often shrinks Petite colonies due to their transparency and resulting similarity to the background. We therefore chose an IOU threshold of 0.1, above which we consider bounding boxes to overlap, to account for the shrinking of Petite colonies after thresholding.
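For clarity, a minimal sketch of the IOU criterion, assuming boxes in (x_min, y_min, x_max, y_max) form:

```python
# Minimal sketch: intersection over union of two axis-aligned boxes.
# A prediction matches a ground-truth box when IOU exceeds 0.1.
def iou(box_a, box_b):
    # Intersection rectangle (zero area if the boxes do not overlap).
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# e.g. iou((0, 0, 10, 10), (5, 5, 15, 15)) = 25 / 175 ≈ 0.14 > 0.1
```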
The third point is that to properly evaluate the accuracy of this now semi-supervised approach, we individually tune the colony size parameters on the validation image sets, which is similar to users selecting maximum/minimum sizes per image, but test the predictive value of the pipeline on separate test images.
As a demonstration of the colony size parameter tuning, we show the precision and recall for both Grande and Petite colonies in the "ideal" validation image set (31 images), as well as the Petite frequency deviations per image, which indicate that in all but one image we are overestimating Petite frequency. This suggests that we are either classifying dust as Petites, classifying Grandes as Petites, or selectively removing Grandes due to the maximum colony size parameter.
We summarize the effectiveness of this colony detection pipeline in Table A.1, where we include the optimal maximum/minimum colony sizes inferred from the "ideal" and "non-ideal" validation image sets, and the precision and recall in the corresponding test sets. As can be seen from this table, the optimal colony size thresholds vary significantly due to experimental variation, as does the classification accuracy of this approach. This suggests that while we see reasonable Petite/Grande classification with our approach, especially in the "ideal" test dataset, parameters like minimum and maximum sizes really do need to be tuned per image or batch of images to obtain the best performance when relying heavily on size as a feature. This can be remedied by an automated approach that does not require per-image parameter tuning.

In this section we summarize the results in Appendix A, which highlight the dependence of a low complexity semi-supervised model on user input to make good predictions. The performance of classical segmentation approaches (intensity thresholding, edge detection, and watershed, to name a few) is highly dependent on experimental conditions. Furthermore, following segmentation, distinguishing colonies from dust or artifacts requires a characterization of their size, shape, colour, and texture. Unfortunately, all of these properties can change based on the imaging setup (resolution, imaging device, orientation), the types of colonies being grown, and other experimental conditions like media type. As we showed in Figure A.5, performing a parameter sweep for a low complexity model to find optimal parameters on a validation set, then applying these to a test set, results in insufficient performance for our purposes on a diverse experimental dataset. This insufficient performance is a result of sub-optimal model complexity and is generally remedied by having users visually tune parameters per image, as is the case for OpenCFU [7], AutoCellSeg [5], and CellProfiler [8]. For example, CellProfiler requires user input for plate edge detection (or a hand-drawn mask), and AutoCellSeg requires user input on example colony locations. All of these methods also require maximum and minimum colony sizes and thresholds on other properties, like eccentricity, to eliminate outliers. Furthermore, given segmentations from these types of approaches, users then have to apply their own classification methods, which are accompanied by more tunable parameters.
To address this issue, in the main text we built a model with sufficient complexity to learn the relationships between colonies and background across a wide range of experimental conditions, one that doesn't require user input to perform well.

B.2 Failures of segmentation and classification
In this section we provide example images to highlight common scenarios where semi-supervised colony detection approaches fail.
Figure: The red curves are the predictions and the black curves are manual counting. The gray envelope is the binomial sampling error (ground truth ± standard deviation), assuming that Petite production is a Bernoulli process with probability equal to the ground truth frequency when sampling 78 colonies per plate image.
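For reference, a minimal sketch of this binomial sampling envelope, with n = 78 colonies per plate as in the figure:

```python
# Minimal sketch: standard deviation of the observed Petite frequency
# when Petite production is Bernoulli with probability p over n colonies.
import math

def petite_freq_std(p: float, n: int = 78) -> float:
    """Binomial standard deviation of the Petite frequency estimate."""
    return math.sqrt(p * (1 - p) / n)

# e.g. for a ground truth frequency of 0.2, the envelope is
# 0.2 ± petite_freq_std(0.2) ≈ 0.2 ± 0.045
```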
Furthermore, this realization can be leveraged to improve performance.

In this exploration we sourced an alternative dataset by accessing publicly available images from DOI: 10.5281/zenodo.3779863, the raw images used in DOI: 10.1242/bio.052936. These images of yeast colonies on Petri dishes differ from ours in four important ways:

1. Different yeast strain - The yeast in this alternative dataset are Schizosaccharomyces pombe, which grow 2 times more slowly than the S. cerevisiae we used. They are also Petite-negative, meaning that Petite colonies do not survive. However, S. pombe cells with aneuploidy often exhibit smaller, more irregular colonies among wild-type, which can serve as a 'Small' colony class, although they won't have the same size distributions or intensity characteristics as true Petites. We will call wild-type colonies 'Large'.

2. Different growth times - Yeast colonies in this dataset were grown for 5-7 days at 30°C, which impacts media opacity as well as colony size.

3. Different media - Yeast were grown on YES media, which has different opacity and colour from the media we used in experiments.

4. Different imaging techniques - Colonies were imaged top-down, using a camera instead of a flatbed scanner, with a lighter background.
Overall, these differing experimental conditions yielded smaller colonies (all colonies were 50% smaller than our Grande colonies relative to image size). There was also no obvious intensity difference between the 'Small' and 'Large' colonies, unlike between the Petite and Grande colonies in our assay. The latter effect is possibly a result of similar cell densities in 'Large' and 'Small' colonies, but may also be related to light being reflected off these colonies rather than transmitted through them, as it is during image scanning.
In Figure C.1a we show the default output of petiteFinder on an image from this alternative dataset. Segmentation accuracy is remarkably good, as it is in our test dataset. A clear issue, however, is that most colonies are classified as 'Small' (orange) and rarely as 'Large' (blue), whereas to our untrained eyes the 'Small' colonies are less frequent than what is shown here. This isn't particularly surprising, as the relative size of the largest colonies is 50% smaller than that of the Grande colonies used to train the model. Furthermore, we have no expectation that on this media the size distributions of 'Small' and 'Large' colonies are as widely separated as those of Petite and Grande colonies on our media. The translucency of the colonies also does not differ substantially, a feature the model likely uses to discriminate between Grande and Petite colonies on our media.
Given this evidence of Petite colony bias (which would become Grande bias in the case of larger colonies), we added a user input parameter to petiteFinder to improve this perceived poor classification accuracy. In the case of colony sizes widely different from the training dataset, an assumption in the existing image slice computation is violated; we previously assumed that when resolution changes and physical colony size does not, image slices can be computed to maintain the colony area to background ratio from the training data. This is not the case when colony size extends beyond the size distribution of the augmented training data. To address this, users can provide an optional --grande_size parameter during inference that accepts the approximate diameter of a Grande colony in pixels. This allows us to compute a slice size that maintains the relative size of the colonies in the training data. We do so by computing image slices as $S_W = S_H = \frac{GD_I}{GD_t} \times 512$. In this equation, $GD_I$ represents the typical diameter of a Grande or 'Large' colony provided by a user, $GD_t$ the typical Grande diameter in the training data (= 50 pixels in our data), and 512 is the size in pixels of the training image slice width or height. This equation serves to enforce the same Grande size to slice size ratio as in the training data.
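A minimal sketch of this slice-size computation, assuming the training values quoted above ($GD_t$ = 50 pixels, 512 pixel slices):

```python
# Minimal sketch: compute the inference slice width/height S_W = S_H
# = (GD_I / GD_t) * 512, preserving the training Grande-to-slice ratio.
TRAIN_GRANDE_DIAMETER = 50  # GD_t, pixels, from the training data
TRAIN_SLICE_SIZE = 512      # training slice width/height, pixels

def slice_size(grande_diameter_px: float) -> int:
    """Slice size for a user-supplied typical Grande diameter GD_I."""
    return round(grande_diameter_px / TRAIN_GRANDE_DIAMETER * TRAIN_SLICE_SIZE)

# e.g. colonies half the training size give smaller slices, so each
# colony occupies the same fraction of a slice as during training:
# slice_size(25) == 256
```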
With this small change, the model output on the same image from the alternative dataset is shown in Figure C.1b. While we aren't confident in the characteristics that delineate 'Large' and 'Small' colonies in this dataset (and the dataset is not annotated), this change has shifted the classifications towards 'Large'.